simon-5502-08-slides

Topics to be covered

  • What you will learn
    • A simple example of survival data
    • Overall Kaplan-Meier curve
    • The log rank test
    • The hazard function
    • The Cox regression model
    • Assumptions and data management

Survival analysis

  • Time to event models
    • Death
    • Relapse
    • Rehospitalization
    • Failure of medical device
    • Pregnancy
  • Not every patient experiences the event
    • These are censored observations

First fruit fly experiment, 1

data_dictionary: fly1.txt
description: |
  This dataset provides a simple example of survival and censoring. It provides an intuitive explanation of the estimation of survival probabilities.
vars:
  day:
    label: Time until death
    unit: days

First fruit fly experiment, 2

37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68, 70, 71, 72, 73, 75, 77, 79, 89, 94, 96

First fruit fly experiment, 3

  day   p
1  37 96%
2  40 92%
3  43 88%
4  44 84%
5  45 80%
6  47 76%
7  49 72%
8  54 68%
9  56 64%
   day   p
10  58 60%
11  59 56%
12  60 52%
13  61 48%
14  62 44%
15  68 40%
16  70 36%
17  71 32%
18  72 28%
   day   p
19  73 24%
20  75 20%
21  77 16%
22  79 12%
23  89  8%
24  94  4%
25  96  0%
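
With no censoring, each percentage in the table above is simply the fraction of flies still alive after a given death. A minimal Python sketch of that calculation follows; the variable names are illustrative and not part of the original slides.

```python
# Death times (days) for the 25 flies in the first experiment.
days = [37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68,
        70, 71, 72, 73, 75, 77, 79, 89, 94, 96]

n = len(days)

# After the i-th death (i = 1..n), n - i flies remain alive,
# so the estimated survival probability is (n - i) / n.
survival = [(day, (n - i) / n) for i, day in enumerate(sorted(days), start=1)]

for day, p in survival:
    print(f"{day:3d} {p:4.0%}")
```

Every death lowers the estimate by the same step of 1/25 = 4%, which only holds because every death time is observed.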

First fruit fly experiment, 4

Second fruit fly experiment, 1

37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68, ??, ??, ??, ??, ??, ??, ??, ??, ??, ??

Second fruit fly experiment, 2

  day event
1  37     1
2  40     1
3  43     1
4  44     1
5  45     1
6  47     1
7  49     1
8  54     1
9  56     1
   day event
10  58     1
11  59     1
12  60     1
13  61     1
14  62     1
15  68     1
16  70     0
17  70     0
18  70     0
   day event
19  70     0
20  70     0
21  70     0
22  70     0
23  70     0
24  70     0
25  70     0
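
The coding used in the table above can be sketched in Python. This assumes, as the table suggests, that the experiment was stopped on day 70 and that the unknown death times ("??") belong to flies still alive on that day. The names `STUDY_END`, `raw`, and `records` are hypothetical.

```python
STUDY_END = 70  # assumed day on which the second experiment stopped

# None marks a fly that was still alive when the experiment ended.
raw = [37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68] + [None] * 10

# Each fly becomes a (day, event) pair: event = 1 for an observed death,
# event = 0 for a censored observation recorded at the end of the study.
records = [(STUDY_END, 0) if d is None else (d, 1) for d in raw]

deaths = sum(event for _, event in records)
censored = len(records) - deaths
print(deaths, censored)
```

A censored fly still carries information: it is known to have survived at least 70 days.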

Second fruit fly experiment, 3

  day event   p
1  37     1 96%
2  40     1 92%
3  43     1 88%
4  44     1 84%
5  45     1 80%
6  47     1 76%
7  49     1 72%
8  54     1 68%
9  56     1 64%
   day event   p
10  58     1 60%
11  59     1 56%
12  60     1 52%
13  61     1 48%
14  62     1 44%
15  68     1 40%
16  70     0    
17  70     0    
18  70     0    
   day event p
19  70     0  
20  70     0  
21  70     0  
22  70     0  
23  70     0  
24  70     0  
25  70     0  

Second fruit fly experiment, 4

Third fruit fly experiment, 1

37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68, ??, 71, ??, ??, 75, ??, ??, 89, ??, 96

Third fruit fly experiment, 2

  day event
1  37     1
2  40     1
3  43     1
4  44     1
5  45     1
6  47     1
7  49     1
8  54     1
9  56     1
   day event
10  58     1
11  59     1
12  60     1
13  61     1
14  62     1
15  68     1
16  70     0
17  71     1
18  70     0
   day event
19  70     0
20  75     1
21  70     0
22  70     0
23  89     1
24  70     0
25  96     1

Third fruit fly experiment, 3

  day event   p
1  37     1 96%
2  40     1 92%
3  43     1 88%
4  44     1 84%
5  45     1 80%
6  47     1 76%
7  49     1 72%
8  54     1 68%
9  56     1 64%
   day event   p
10  58     1 60%
11  59     1 56%
12  60     1 52%
13  61     1 48%
14  62     1 44%
15  68     1 40%
16  70     0    
17  71     1 30%
18  70     0    
   day event   p
19  70     0    
20  75     1 20%
21  70     0    
22  70     0    
23  89     1 10%
24  70     0    
25  96     1  0%
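
The percentages after day 70 come from the Kaplan-Meier (product-limit) calculation: at each death time, multiply the current survival estimate by the fraction of flies at risk that survive that death. A minimal Python sketch under that definition follows; the function name `kaplan_meier` is illustrative, not a library call.

```python
def kaplan_meier(records):
    """Return [(day, survival)] at each distinct death time.

    records is a list of (day, event) pairs, with event = 1 for a death
    and event = 0 for a censored observation.
    """
    survival = 1.0
    curve = []
    for day in sorted({d for d, e in records if e == 1}):
        # Flies at risk: not yet dead and not yet censored before this day.
        at_risk = sum(1 for d, _ in records if d >= day)
        deaths = sum(1 for d, e in records if d == day and e == 1)
        survival *= (at_risk - deaths) / at_risk
        curve.append((day, survival))
    return curve


death_days = [37, 40, 43, 44, 45, 47, 49, 54, 56, 58, 59, 60, 61, 62, 68,
              71, 75, 89, 96]
third = [(d, 1) for d in death_days] + [(70, 0)] * 6

for day, p in kaplan_meier(third):
    print(f"{day:3d} {p:4.0%}")
```

After the six flies are censored on day 70, only four remain at risk, so each later death removes a quarter of the remaining 40% rather than 4%. That is why the estimates drop to 30%, 20%, 10%, and 0%.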

Third fruit fly experiment, 4

Interpreting Kaplan-Meier plots, 1

Interpreting Kaplan-Meier plots, 2

Interpreting Kaplan-Meier plots, 3

Interpreting Kaplan-Meier plots, 4

Break #1

  • What you have learned
    • A simple example of survival data
  • What’s coming next
    • Overall Kaplan-Meier curve

Worcester Heart Attack Study, 1

data_dictionary: whas500.dat
description: The data represent survival times for a 500-patient subset of data from
  the Worcester Heart Attack Study. You can find more information about this data
  set in Chapter 1 of Hosmer, Lemeshow, and May.

Worcester Heart Attack Study, 2

id:
  label: a sequential code from 1 to 500
age:
  scale: ratio
  label: Age at Admission
  unit: years
gender:
  scale: binary
  value:
    Male: 0
    Female: 1

Worcester Heart Attack Study, 3

hr:
  scale: ratio
  label: Initial Heart Rate
  unit: Beats per minute
sysbp:
  scale: ratio
  label: Initial Systolic Blood Pressure
  unit: mmHg
diasbp:
  scale: ratio
  label: Initial Diastolic Blood Pressure
  unit: mmHg

Worcester Heart Attack Study, 4

bmi:
  scale: ratio
  label: Body Mass Index
  unit: kg/m^2
cvd:
  scale: binary
  label: History of Cardiovascular Disease
  value:
    'FALSE': 0
    'TRUE': 1

Worcester Heart Attack Study, 5

afb:
  scale: binary
  label: Atrial Fibrillation
  value:
    'FALSE': 0
    'TRUE': 1
sho:
  scale: binary
  label: Cardiogenic Shock
  value:
    'FALSE': 0
    'TRUE': 1

Worcester Heart Attack Study, 6

chf:
  scale: binary
  label: Congestive Heart Complications
  value:
    'FALSE': 0
    'TRUE': 1
av3:
  scale: binary
  label: Complete Heart Block
  value:
    'FALSE': 0
    'TRUE': 1

Worcester Heart Attack Study, 7

miord:
  scale: binary
  label: MI Order
  value:
    First: 0
    Recurrent: 1
mitype:
  scale: binary
  label: MI Type
  value:
    non Q-wave: 0
    Q-wave: 1

Worcester Heart Attack Study, 8

year:
  scale: ordinal
  label: Cohort Year
  value:
    yr1997: 1
    yr1999: 2
    yr2001: 3
admitdate:
  label: Admission Date
  format: mm/dd/yyyy
disdate:
  label: Hospital Discharge Date
  format: mm/dd/yyyy

Worcester Heart Attack Study, 9

fdate:
  label: Date of last Follow Up
  format: mm/dd/yyyy
los:
  scale: ratio
  label: Length of Hospital Stay
  unit: Days
dstat:
  scale: binary
  label: Discharge Status from Hospital
  value:
    Alive: 0
    Dead: 1

Worcester Heart Attack Study, 10

lenfol:
  scale: ratio
  label: Follow Up Time
  unit: days
fstat:
  scale: binary
  label: Vital Status
  value:
    Alive: 0
    Dead: 1

Event count

# A tibble: 2 × 2
  fstat     n
  <dbl> <int>
1     0   285
2     1   215

Overall Kaplan-Meier curve

Live demo, Overall Kaplan-Meier curve

Break #2

  • What you have learned
    • Overall Kaplan-Meier curve
  • What’s coming next
    • The log rank test

Cox regression for gender

Call:
coxph(formula = Surv(lenfol, fstat) ~ gender, data = whas_4)

         coef exp(coef) se(coef)    z      p
gender 0.3417    1.4074   0.1526 2.24 0.0251

Likelihood ratio test=4.95  on 1 df, p=0.02606
n= 461, number of events= 176 
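
The columns of this output are tied together by simple arithmetic, sketched below in Python. The hazard ratio is exp(coef), and z is coef divided by its standard error. The Wald 95% confidence interval at the end is an addition that does not appear in the output above. Because gender is coded 0 for male and 1 for female, the hazard ratio compares women to men.

```python
import math

coef = 0.3417  # log hazard ratio for gender, copied from the output above
se = 0.1526    # its standard error, copied from the output above

hazard_ratio = math.exp(coef)
z = coef / se

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Wald 95% confidence interval, computed on the log scale and then
# exponentiated (not part of the original output).
lower = math.exp(coef - 1.96 * se)
upper = math.exp(coef + 1.96 * se)

print(f"HR = {hazard_ratio:.4f}, z = {z:.2f}, p = {p_value:.4f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Small differences in the last digit relative to the printed output are expected, because the inputs here are already rounded to four decimal places.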

Covariate imbalance

# A tibble: 2 × 2
  gender age_mean
   <dbl>    <dbl>
1      0     65.9
2      1     74.0

Cox regression adjusted for age

Call:
coxph(formula = Surv(lenfol, fstat) ~ gender, data = whas_4)

         coef exp(coef) se(coef)    z      p
gender 0.3417    1.4074   0.1526 2.24 0.0251

Likelihood ratio test=4.95  on 1 df, p=0.02606
n= 461, number of events= 176 

Live demo, The log rank test

Break #3

  • What you have learned
    • The log rank test
  • What’s coming next
    • The hazard function

Life insurance example

Probabilities for ages 21 through 41

Probabilities for ages 95 through 99

Why are these probabilities not comparable?

  • Unequal time intervals
    • Fix by computing a rate
  • Non-uniform probabilities over the interval
    • Fix by looking at narrow interval
  • No adjustment for survivorship
    • Fix by dividing by survival probability

Hazard function, definition

  • \[h(t)=\lim_{\Delta t \rightarrow 0}\frac{P[t \le T \le t+\Delta t]/\Delta t}{P[T \ge t]}\]

  • \[h(t)=\frac{f(t)}{S(t)}\]

    • where \(f\) is the density function, and
    • \(S\) is the survival function (\(S(t)=1-F(t)\))
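
The two forms of the definition can be checked against each other numerically. The Python sketch below assumes an exponential survival time with a hypothetical rate of 0.1 per unit time, chosen only because its hazard is known to be constant. It is not the life insurance example from the slides.

```python
import math

RATE = 0.1  # hypothetical exponential rate (events per unit time)


def density(t):
    """f(t) for the exponential distribution."""
    return RATE * math.exp(-RATE * t)


def survival(t):
    """S(t) = 1 - F(t) for the exponential distribution."""
    return math.exp(-RATE * t)


def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)


def hazard_from_limit(t, dt=1e-6):
    """Approximate the limit definition using a small interval dt."""
    prob_event_in_interval = survival(t) - survival(t + dt)
    return (prob_event_in_interval / dt) / survival(t)


for t in (1.0, 10.0, 50.0):
    print(t, hazard(t), hazard_from_limit(t))
```

Both calculations stay close to 0.1 at every time point, even though the density and the survival probability each shrink as t grows. Dividing by S(t) is the adjustment for survivorship.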

Hazard function, example

Hazard function on a log scale

Break #4

  • What you have learned
    • The hazard function
  • What’s coming next
    • The Cox regression model

Mean ages for men and women

Unadjusted and adjusted Cox regression models for gender

Live demo, The Cox regression model

Break #5

  • What you have learned
    • The Cox regression model
  • What’s coming next
    • Assumptions and data management

Assumptions of the log rank test

  • Independence
    • From one patient to another
    • Of censoring mechanism

Assumptions of the Cox regression model

  • Independence
  • Proportional hazards assumption
  • Possible violations of proportional hazards
    • Survival curves that cross
    • One curve flattening out over time
    • Curves diverge only at later times

Survival curves that cross

One curve flattening out over time

Curves diverge only at later times

Sample size issues

  • Rule of 50
  • Rule of 15

Use ISO format for dates

Understand the internal storage system for dates

Date management

  • The three dates you need
    • the date of origin,
    • the date of the event (if it occurred),
    • the date of last contact with the patient.

The date of origin

  • Rehospitalization
    • use date of first discharge.
  • Failure of a mechanical device
    • use date of implant.
  • Divorce
    • use date of marriage.
  • Loan default
    • use date of loan contract.
  • Infectious disease
    • use date of first exposure.

The date of the event

  • Define your event precisely
    • All-cause mortality
    • Mortality related to the health condition
    • Composite endpoints (e.g., death or relapse)
      • Requires comparing the earlier of two dates.
  • If the event did NOT happen, leave this field blank/missing.
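
For a composite endpoint, the event date is the earliest of the component dates that actually occurred. A minimal Python sketch follows, using `None` for a blank date. The function name and all patient dates are made up for illustration.

```python
from datetime import date


def composite_event_date(*event_dates):
    """Return the earliest non-missing date, or None if no event occurred."""
    observed = [d for d in event_dates if d is not None]
    return min(observed) if observed else None


# Hypothetical patients: (relapse date, death date).
relapse_then_death = composite_event_date(date(2021, 3, 14), date(2022, 1, 5))
death_only = composite_event_date(None, date(2020, 11, 30))
no_event = composite_event_date(None, None)

print(relapse_then_death, death_only, no_event)
```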

The date of last contact

  • If event did not occur
    • Must be specified
    • Typically last medical exam or last telephone contact
  • If event did occur
    • Make same as event date, or
    • Leave blank

Survival calculations, 1 of 2

  • Time = max(Date of event, Date of last contact) - Date of origin, ignoring any date that is blank
  • Censoring variable = 0 if Date of event is missing (censored), 1 if the event occurred
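
These two rules can be sketched in Python with dates stored as ISO strings (yyyy-mm-dd). The function name `survival_record` and the example dates are hypothetical. A blank date is represented by `None`.

```python
from datetime import date


def survival_record(origin, event=None, last_contact=None):
    """Return (time_in_days, censoring_variable) from ISO date strings."""
    known = [date.fromisoformat(d) for d in (event, last_contact) if d]
    if not known:
        raise ValueError("need an event date or a date of last contact")
    # Later of the two dates, skipping whichever one is blank.
    time = (max(known) - date.fromisoformat(origin)).days
    # 1 if the event occurred, 0 if the observation is censored.
    status = 1 if event else 0
    return time, status


print(survival_record("2020-01-15", event="2021-03-01"))
print(survival_record("2020-01-15", last_contact="2020-12-31"))
```

The first patient has an event 411 days after the date of origin. The second is censored at 351 days.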

Survival calculations, 2 of 2

Summary

  • What you have learned
    • A simple example of survival data
    • Overall Kaplan-Meier curve
    • The log rank test
    • The hazard function
    • The Cox regression model
    • Assumptions and data management